Loading required package: Signac
The joint analysis of multiple single-cell datasets is achieved by identifying cross-dataset pairs of cells that have shared cell populations, named as anchors. This helps address two problems:
#Data normalization and variable feature selection
lung.list <- SplitObject(lung, split.by = "orig.ident")
lung.list <- lapply(X = lung.list, FUN = function(x) {
x <- NormalizeData(x, verbose = FALSE)
x <- FindVariableFeatures(x, verbose = FALSE)
})
#select integration features
features <- SelectIntegrationFeatures(object.list = lung.list)
lung.list <- lapply(X = lung.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})
#iteratively find anchors that link cells across two datasets
anchors <- FindIntegrationAnchors(object.list = lung.list, reference = c(1, 2), reduction = "rpca",
dims = 1:20)#1:50
lung.integrated <- IntegrateData(anchorset = anchors, dims = 1:20)
With the the integration approach, we expect technical differences such as batch effects should be corrected. To confirm that, we check the clustering results for the same tissue across batches. When split by tissue, the top plot shows the same tissue samples from different batches look well mixed. When split by tissue and batch, the bottom clustering results show no distinct clusters coming from one batch or one donor.
Graph-based clustering
p2
Major cell types
T cell subsets
immune.combined
Loading required package: Signac
An object of class Seurat 321136 features across 53647 samples within 4 assays Active assay: RNA (36601 features, 0 variable features) 3 other assays present: ATAC, integrated, peaks 3 dimensional reductions calculated: pca, umap, lsi
Differential expression test was performed on normalized values stored in immune.combine[["RNA"]]@data. Here are the normalization steps:
Volcano plots and QQ plots comparing against null
p1 + p2
eg = bitr(rownames(sig.genes), fromType = "SYMBOL", toType="ENTREZID", OrgDb = "org.Hs.eg.db")
all.eg = bitr(rownames(DEG.test), fromType = "SYMBOL", toType="ENTREZID", OrgDb = "org.Hs.eg.db")
go_list <- enrichGO(gene = eg$ENTREZID,
universe = names(all.eg$ENTREZID),
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable = TRUE)
library(forcats)
library(enrichplot)
go_list<-mutate(go_list, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))
ggplot(go_list, showCategory = 20,
aes(richFactor, fct_reorder(Description, richFactor))) +
geom_segment(aes(xend=0, yend = Description)) +
geom_point(aes(color=qvalue, size = Count)) +
scale_color_viridis_c(guide=guide_colorbar(reverse=TRUE)) +
scale_size_continuous(range=c(2, 10)) +
theme_minimal() + theme(axis.text.y=element_text(size=12)) +
xlab("rich factor") +
ylab(NULL)
Color scheme:
Red: negative log2FC (Upregulated in lungs)
blue: positive log2FC (Upregulated in spleens)
grey: data points not passing bonferroni corrected cutoff 0.05
DE results for major cell types with MAST
nonT<-marrangeGrob(volcano_plots[c(1,2,3,7,9,12)], nrow=2, ncol=3)
nonT
[[1]] NULL
DE results for major cell types with Wilcox ranksum test
#wilcoxon test
nonT<-marrangeGrob(volcano_plots[c(1,2,3,7,9,12)], nrow=2, ncol=3)
nonT
[[1]] NULL
DE results for T cell susbets with MAST
[[1]] NULL
DE results for T cell susbets with Wilcoxon rank sum test
T<-marrangeGrob(volcano_plots[c(4,5,6,10)], nrow=2, ncol=3)
T
[[1]] NULL
ggplot(counts.tbl,
aes(x = c, y=n,
fill=labels, label=n,
width=0.5)) +
geom_col(position="dodge2") + geom_label(size=5,position = position_dodge2(width=0.8)) +
theme_light(base_size = 15) + labs(y="DE genes") + ggtitle("DE with Wilcoxon rank sum test") +
scale_fill_discrete(labels=c("Up-regulated in spleens",
"Up-regulated in lungs")) +
theme(axis.text.x=element_text(size=16, vjust=0.6, angle = 45))
Warning message: “ggrepel: 167 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message: “ggrepel: 94 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message: “ggrepel: 88 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message: “ggrepel: 237 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
## wilcoxon ranksum test
marrangeGrob(vplots, nrow=2, ncol=2)
Warning message: “ggrepel: 224 unlabeled data points (too many overlaps). Consider increasing max.overlaps” Warning message: “ggrepel: 277 unlabeled data points (too many overlaps). Consider increasing max.overlaps” Warning message: “ggrepel: 86 unlabeled data points (too many overlaps). Consider increasing max.overlaps” Warning message: “ggrepel: 49 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
[[1]] NULL
Warning message: “ggrepel: 221 unlabeled data points (too many overlaps). Consider increasing max.overlaps” Warning message: “ggrepel: 295 unlabeled data points (too many overlaps). Consider increasing max.overlaps” Warning message: “ggrepel: 63 unlabeled data points (too many overlaps). Consider increasing max.overlaps” Warning message: “ggrepel: 39 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
[[1]] NULL
p1+p2
Identify the known pathways that are enriched with the differentially expressed genes in lungs
TO DO: Use WebGesalt to remove redundant GO terms
[[1]] NULL
[[1]] NULL
[[1]] NULL